WebClass: Web Document Classification Using Modified Decision Trees
Abstract
Searching for Web sites is one of the most common tasks performed on the Web. Web page classification is the first step in constructing a Web search service. This paper proposes a system, named WebClass, for classifying Web documents by using a height-three modified decision tree which splits the root, depth-one nodes, and depth-two nodes on the keywords, descriptions, and hyperlinks, respectively. To classify a URL, start at the root of the decision tree and trace paths downward to the leaves, which give the categories the URL belongs to. A comparison of manual classification with WebClass shows that the latter achieves over 73% accuracy relative to human classification.
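The traversal described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `Node` class, the `classify` function, the substring-matching test at each node, and the example terms and categories are all assumptions introduced here. Only the ordering of the splits (keywords, then descriptions, then hyperlinks) and the idea of tracing every matching path down to the leaves come from the abstract.

```python
from dataclasses import dataclass, field

# Feature examined at each depth, in the order stated in the abstract.
FEATURE_BY_DEPTH = ("keywords", "description", "hyperlinks")


@dataclass
class Node:
    # Maps a trigger term to a child Node, or to a category name (str)
    # when the child is a leaf. The term-based test is an assumption.
    branches: dict = field(default_factory=dict)


def classify(page, node, depth=0):
    """Follow every branch whose term occurs in the feature for this depth.

    Returns the set of categories found at the leaves that were reached.
    """
    text = page.get(FEATURE_BY_DEPTH[depth], "").lower()
    categories = set()
    for term, child in node.branches.items():
        if term in text:  # hypothetical split test: simple substring match
            if isinstance(child, Node):
                categories |= classify(page, child, depth + 1)
            else:
                categories.add(child)
    return categories


# Hypothetical height-three tree: root -> depth one -> depth two -> leaves.
tree = Node({
    "sport": Node({
        "football": Node({"league": "Sports/Football"}),
        "tennis": Node({"atp": "Sports/Tennis"}),
    }),
    "news": Node({
        "sport": Node({"score": "News/Sports"}),
    }),
})

# Hypothetical page features extracted from one URL.
page = {
    "keywords": "sport, news, results",
    "description": "Football and other sport headlines",
    "hyperlinks": "http://example.com/league http://example.com/scores",
}

result = classify(page, tree)
```

Because more than one branch can match at each node, a single URL may reach several leaves and so receive several categories, which matches the abstract's wording that the leaves "give the categories the URL belongs to".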
Similar resources
Language Independent Named Entity Classification by Modified Transformation-Based Learning and by Decision Tree
We describe our last results at the CoNLL2002 shared task of Named Entity Recognition and Classification using two approaches that we first applied to other NLL problems. We have been developing our own modified TBL learner initially to tackle the Part-of-Speech tagging problem, for integration in a hybrid NLL and rule-based system for information extraction (Ciravegna et al., 1999). After encourag...
Using Causal Knowledge to Learn More Useful Decision Rules From Data
One of the most popular and enduring paradigms in the intersection of machine-learning and computational statistics is the use of recursive-partitioning or "tree-structured" methods to "learn" classification trees from data sets (Buntine, 1993; Quinlan, 1986). This approach applies to independent variables of all scale types (binary, categorical, ordered categorical, and continuous) and to noisy...
Using Model Trees for Classification
Model trees, which are a type of decision tree with linear regression functions at the leaves, form the basis of a recent successful technique for predicting continuous numeric values. They can be applied to classification problems by employing a standard method of transforming a classification problem into a problem of function approximation. Surprisingly, using this simple transformation the mo...
Parallel Formulations of Decision-Tree Classification Algorithms
Classification decision tree algorithms are used extensively for data mining in many domains such as retail target marketing, fraud detection, etc. Highly parallel algorithms for constructing classification decision trees are desirable for dealing with large data sets in a reasonable amount of time. Algorithms for building classification decision trees have a natural concurrency, but are difficult to ...
Preliminary Investigations into Interactive Classification in Description Logics
Interactive classification in description logics is the process of querying a user to obtain information about an individual in a description logic knowledge base. A complex formalization of this process takes into account the expected costs of the entire decision tree to determine the best placement of the individual. A simpler, and easier to compute, formalization uses costs only for determin...